December  1996 


LIDS-P-2375 


Research  Supported  By: 

Army  Research  Office  (DAAL-03-92-G-1 15) 
Air  pOTce  Office  of  Scientific  Research 
(F49620-95-1-0083  and  BU  GC12391NGD) 
D.  Tucker 


ROBUST  WAVELET  THRESHOLDING  FOR  NOISE 

SUPPRESSION 


L  C.  Schick 


H.  Krim 


Report  Documentation  Page 

Form  Approved 

0MB  No.  0704-0188 

Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 

VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  0MB  control  number. 

1.  REPORT  DATE 

DEC  1996 

2.  REPORT  TYPE 

3.  DATES  COVERED 

00-12-1996  to  00-12-1996 

4.  TITLE  AND  SUBTITLE 

5a.  CONTRACT  NUMBER 

Robust  Wavelet  Thresholding  for  Noise  Suppression 

5b.  GRANT  NUMBER 

5c.  PROGRAM  ELEMENT  NUMBER 

6.  AUTHOR(S) 

5d.  PROJECT  NUMBER 

5e.  TASK  NUMBER 

5f.  WORK  UNIT  NUMBER 

7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Massachusetts  Institute  of  Technology, Laboratory  for  Information  and 
Decision  Systems, 77  Massachusetts  Avenue, Cambridge, MA, 02139-4307 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 

10.  SPONSOR/MONITOR’S  ACRONYM(S) 

11.  SPONSOR/MONITOR’S  REPORT 
NUMBER(S) 

12.  DISTRIBUTION/AVAILABILITY  STATEMENT 

Approved  for  public  release;  distribution  unlimited 

13.  SUPPLEMENTARY  NOTES 

14.  ABSTRACT 

15.  SUBJECT  TERMS 

16.  SECURITY  CLASSIFICATION  OF: 

17.  LIMITATION  OF 
ABSTRACT 

18.  NUMBER 

OF  PAGES 

5 

19a.  NAME  OF 
RESPONSIBLE  PERSON 

a.  REPORT 

unclassified 

b.  ABSTRACT 

unclassified 

c.  THIS  PAGE 

unclassified 

Standard  Form  298  (Rev.  8-98} 

Prescribed  by  ANSI  Std  Z39-18 


Robust  Wavelet  Thresholding  for  Noise  Suppression 

I.  C.  Schick\  and  H.  KrimX 

fNetwork  Engineering,  BBN;  DBAS,  Harvard  University,  Cambridge,  MA  02138 

schick@bbn.com 

^Stochastic  Systems  Group,  LIDS,  MIT,  Cambridge,  MA  02139 

ahk@mit.edu 


Abstract 

Approaches  to  wavelet-based  denoising  (or  signal  en¬ 
hancement)  have  so  far  relied  on  the  assumption  of  nor¬ 
mally  distributed  perturbations.  To  relax  this  assump¬ 
tion,  which  is  often  violated  in  practice,  we  derive  a  robust 
wavelet  thresholding  technique  based  on  the  Minimax  De¬ 
scription  Length  principle.  We  first  determine  the  least 
favorable  distribution  in  the  e-contaminated  normal  fam¬ 
ily  as  the  member  that  maximizes  the  entropy.  We  show 
that  this  distribution  and  the  best  estimate  based  upon 
it,  namely  the  Maximum  Likelihood  Estimate,  constitute 
a  saddle  point.  This  results  in  a  threshold  that  is  more 
resistant  to  heavy-tailed  noise,  but  for  which  the  estima¬ 
tion  error  is  still  potentially  unbounded.  We  address  the 
practical  case  where  the  underlying  signal  is  known  to  be 
bounded,  and  derive  a  two-sided  thresholding  technique 
that  is  resistant  to  outliers  and  has  bounded  error.  We 
provide  illustrative  examples. 


1  Introduction 

The  concept  of  “scale”  has  emerged  in  recent  years  as 
an  important  characteristic  for  signal  analysis,  par¬ 
ticularly  with  the  advent  of  wavelet  theory. 

Wavelets  provide  a  powerful  tool  for  non-linear  fil¬ 
tering  of  signals  contaminated  by  noise.  Mallat  and 
Hwang  [1]  have  shown  that  effective  noise  suppres¬ 
sion  may  be  achieved  by  transforming  the  noisy  sig¬ 
nal  into  the  wavelet  domain,  and  preserving  only  the 
local  maxima  of  the  transform.  Alternatively,  a  re¬ 
construction  that  uses  only  the  large-magnitude  co¬ 
efficients  has  been  shown  to  approximate  well  the  un¬ 
corrupted  signal.  In  other  words,  noise  suppression 
is  achieved  by  thresholding  the  wavelet  transform  of 
the  contaminated  signal. 

To  choose  the  appropriate  threshold,  Donoho  and 
Johnstone  [2]  have  taken  a  minimax  approach  to 
characterizing  the  signal  (rather  than  the  distur¬ 
bance,  which  they  assume  to  be  Gaussian).  They 
derived  a  threshold  that  is  approximately  minimax 
(in  the  sense  that  its  sample  size  dependence  is  of 
the  same  order  as  that  of  the  true  minimax):  a  co¬ 


efficient  Cj  is  excluded  from  the  reconstruction  if 
I  Ci  |<  a^/2  log  iiT,  where  a  is  the  standard  devia¬ 
tion  of  the  noise,  and  K  is  the  length  of  the  obser¬ 
vation.  Krim  and  Pesquet  [3]  have  used  Rissanen’s 
Minimum  Description  Length  (MDL)  criterion  [4]  to¬ 
gether  with  the  assumption  of  normally  distributed 
noise,  and  derived  an  identical  threshold.  Another 
feature  that  makes  this  threshold  compelling  is  that 
it  is  asymptotically  equivalent  to  the  maximum  of  a 
sample  of  independent  normally  distributed  variates 
[5],  suggesting  the  intuitively  pleasing  interpretation 
that  anything  larger  in  magnitude  is  extremely  un¬ 
likely  to  be  pure  noise  and  must  therefore  contain 
signal. 

Nevertheless,  the  procedure  remains  non-robust.  Al¬ 
though  wavelets,  thanks  to  their  compactness  and  lo¬ 
calization  properties,  do  provide  an  unconditional  ba¬ 
sis  for  a  large  smoothness  class  of  signals  and  offer  a 
simple  framework  for  nonlinear  filtering,  the  proce¬ 
dures  derived  to  date  have  been  based  upon  the  as¬ 
sumption  of  normality  of  the  noise,  and  are  therefore 
sensitive  to  outliers,  i.e.  to  noise  distributions  whose 
tails  are  heavier  than  the  Gaussian  distribution.  In 
this  paper  we  adopt  the  minimax  approach  due  to 
Huber  [6]  to  derive  a  thresholding  technique  that  is 
resistant  to  spurious  observations. 


2  Problem  Statement 

The  estimation  problem  of  interest  in  this  paper  as¬ 
sumes  the  following  observation  model: 

x{t)  =  s{t)  +  n{t) ,  t=l,...,K,  (1) 

with  x{t)  G  L^(E),  and  where  s{t)  is  a  deterministic 
but  unknown  signal  corrupted  by  the  noise  process 
n{t). 

In  nonpar ametric  estimation,  the  underlying  signal 
model  is  often  assumed  to  be  induced  by  an  orthonor¬ 
mal  basis  representation, 

=  (2) 
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which  in  turn  leads  to  the  working  model 

Ci  =  Cl  +  C^,  i  =  ,K,  (3) 

where  the  independent  noise  component  has  the  same 
statistical  properties  as  n{t).  Our  problem  is  to  re¬ 
cover/reconstruct  s{t)  from  the  orthogonal  transform 
of  the  observed  process  of  x{t).  This  can  be  achieved 
by  using  the  MDL  principle  to  determine  which  co¬ 
efficients  Ci  contain  signal  information,  and  which 
are  primarily  noise  and  can  therefore  be  left  out  of 
the  reconstruction.  However,  since  MDL  is  the  max¬ 
imum  log  likelihood  minus  a  penalty  term  propor¬ 
tional  to  the  number  of  parameters  (i.e.  the  number 
of  signal-containing  coefficients  Ci)  in  the  model,  it  is 
strongly  dependent  on  the  distributional  assumptions 
that  characterize  the  noise.  This  paper  addresses  the 
derivation  of  a  filtering  technique  that  is  resistant  to 
heavy-tailed  noise. 


3  The  Minimcix  Description 
Length  (MMDL)  Criterion 

Following  Huber  [6],  we  assume  that  the  noise  dis¬ 
tribution  /  is  a  (possibly)  scaled  version  of  a  dis¬ 
tribution  belonging  to  the  family  of  e-contaminated 
normal  distributions  Vs  —  {(1  —  e)^  +  eG  ■.  G  £  V], 
where  #  is  the  standard  normal  distribution,  V  is  the 
set  of  all  distribution  functions,  and  e  G  (0, 1)  is  the 
known  fraction  of  contamination.  We  cast  our  signal 
estimation  problem  as  one  of  location  parameter  es¬ 
timation,  and  thus  assume  the  estimators  to  be  in  S, 
the  set  of  all  integrable  mappings  from  M  to  R 

As  in  [7],  we  use  the  coding  length  of  the  observation 
in  Equation  3  to  determine  the  optimality  of  the  sig¬ 
nal  estimate.  For  fixed  model  order,  the  expectation 
of  the  MDL  criterion  is  the  entropy  (plus  the  penalty 
term  which  is  independent  of  both  the  distribution 
and  the  functional  form  of  the  estimator).  In  accor¬ 
dance  with  the  minimax  principle,  we  seek  the  least 
favorable  noise  distribution  and  evaluate  the  MDL 
criterion  for  that  distribution.  In  other  words,  we 
solve  a  minimax  problem  where  the  entropy  is  simul¬ 
taneously  maximized  over  all  distributions  in  Vs  and 
minimized  over  all  estimators  in  S. 

The  least  favorable  distribution  in  Vs,  i.e.  the  dis¬ 
tribution  that  maximizes  the  entropy,  is  precisely  the 
same  as  that  found  by  Huber  to  maximize  the  asymp¬ 
totic  variance  (or  equivalently  minimize  the  Fisher 
information).  Generalizing  Huber’s  distribution  to 
perturbations  from  the  zero-mean  normal  distribu¬ 
tion  with  variance  cr^,  we  get: 

Proposition  1.  The  distribution  fn  G  Vs  that  max¬ 
imizes  the  entropy  is 

[  (1  -  £)</.,,  (a)e^ c  <  -a 
/ij(c)  ^  <  (1  -  £)0<.(c)  -a<c<a 

[  (l-e)<?i<.(a)e^(““"+“'^  a<c  (4) 


where  is  the  normal  density  with  variance  and 
a  is  related  to  e  by  the  equation 


(pcria) 

a/a^ 


$.(-a) 


£ 

1  -  £ 


(5) 


Proof:  The  proof  that  fn  maximizes  the  entropy  is 
similar  to  that  of  Huber  for  the  Fisher  information. 
It  can  be  shown  that  the  negentropy  H{f)  =  E[log/] 
is  a  convex  function  of  /,  that  Vs  is  a  convex  set,  and 
that  if  we  define  /a  =  (1  -  A)/if  -b  A/  for  any  f  £  Vs 
and  any  A  G  (0, 1),  then 

|^H(/a)  |a=o  >  0  (6) 

establishing  the  desired  result.  ■ 

Thus,  the  least  favorable  distribution  in  Vs  is  normal 
in  the  center  and  Laplacian  (“double  exponential”)  in 
the  tails.  The  point  where  the  density  switches  from 
Gaussian  to  Laplacian  is  a  function  of  the  fraction 
of  contamination,  larger  fractions  corresponding  to 
smaller  switching  points  and  vice  versa. 

For  a  given  distribution,  the  entropy  is  minimized 
by  the  Maximum  Likelihood  Estimate  (MLE):  since 
the  negentropy  is  the  expectation  of  the  log  likeli¬ 
hood,  it  follows  that  E[log/(C';0MiE(C'))]  is  maxi¬ 
mum  among  all  functions  0  £  S.  Thus,  we  obtain: 

Proposition  2.  Huber’s  distribution  fu,  together 
with  the  MLE  based  on  it,  9h,  result  in  a  Minimax 
Description  Length,  i.e.  they  satisfy  a  saddle-point 
condition. 


Proof:  Using  a  theorem  due  to  Verdii  and  Poor  [8], 
this  can  be  shown  to  be  equivalent  to  proving  that 

/  h{c;e)\og^pp^ldc  =  0(A)  (7) 

J  /a(c;%(c)) 


where  6  is  the  true  value  of  the  parameter  and  is 
the  MLE  based  upon  /a.  Setting  9\  =  9h  -h  A  (A),  it 
can  be  shown  that 


fxjc-Jxjc)) 

/a(c;0h(c)) 


=  1  -f  A(A)Ae  ^5'(c;0ij(c)) 

-  g'nic-jHic)))  +  0(A)  (8) 


where,  since  /  and  fn  are  in  Vs,  they  can  be  repre¬ 
sented  as  f  =  {1  —  e)(l> -h  eg  and  /jy  =  (1 —  £)</) -fsfffy, 
respectively.  Furthermore,  since 


/a(c;^a(c))  =  f'Hic-,9x{c))  +  Xs(^g'{c;9xic)) 

-9h(c;4(c)))  (9) 
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it  follows  that  A(A)  =  0(A)  with  probability  1,  and 
thus, 


f\{c\dH{c)) 


=  o{\) 


proving  the  result. 


(10) 


4  Minimax  Thresholding 

As  illustrated  by  Figure  1,  there  are  two  distinct 
cases:  the  region  where  the  exponent  is  linear,  and 
the  region  when  the  exponent  is  quadratic. 


Figure  1;  Plot  of  the  negative  exponent  vs.  O,  show¬ 
ing  quadratic  and  linear  regions. 


normal  and  robust  thresholds.  The  noisy  signal  (ap¬ 
proximately  OdB  SNR)  in  the  top  graph  consists  of 
two  ramps  with  a  discontinuity  between  them,  onto 
which  is  superimposed  noise  from  a  Gaussian  mix¬ 
ture  with  the  following  components:  with 

probability  1  -  e  =  0.9,  and  j\/'(0,  9<t^)  with  proba¬ 
bility  e  =  0.1.  The  second  graph  shows  a  reconstruc¬ 
tion  using  coefficients  over  4  resolutions  achieved  by 
the  normal  threshold;  the  third  graph  shows  a  sim¬ 
ilar  reconstruction  achieved  by  the  proposed  robust 
threshold.  The  robust  threshold  suppresses  several 
instances  of  impulsive  noise  that  the  normal  thresh¬ 
old  misses. 


Noisy  Signal 
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Case  1  When  logiT  >  the  coefficient  estimate  is 
set  to  zero  if 

^  (alC-i  I  -  y)  <  logiF  (11) 

which  implies  that 

\Ci\<^  +  —\ogK  (12) 

Ql 

2 

Case  2  When  log  K  <  ^ ,  the  coefficient  estimate  is 
set  to  zero  if 

<  logK  (13) 

which  implies  that 

\Ci\<  <7^2  log  R  (14) 

This  is  the  threshold  based  on  the  assumption  of  nor¬ 
mality,  as  proposed  by  [2]  and  [3]. 

A  numerical  example  for  Case  1  appears  in  Figure  2, 
which  shows  a  comparison  of  the  performance  of  the 


Figure  2:  Noisy  ramp  signal,  and  its  classical  and 
robust  reconstructions. 


5  Constrained  Minimax 
Thresholding 

The  procedure  outlined  above  is  somewhat  more  con¬ 
servative  than  the  traditional  threshold  in  deciding 
whether  or  not  a  coefficient  represents  “primarily 
noise”  and  thus  should  be  excluded  from  the  recon¬ 
struction.  It  is  therefore  less  vulnerable  to  noise  that 
is  heavier-tailed  than  normal.  However,  since  it  is 
based  upon  thresholding  from  below  only,  it  does  not 
result  in  an  estimator  whose  error  is  bounded.  Be¬ 
cause  the  coefficients  {Ci}  are  assumed  independent, 
Ci  =  Ci  for  each  Ci  that  exceeds  the  threshold,  and 
there  is  nothing  to  counterbalance  the  influence  of  a 
single  spurious  data  point.  Thus  the  estimation  error 
can  increase  without  bound. 

In  this  section,  we  describe  a  method  that  involves 
thresholding  from  both  above  and  below,  resulting  in 
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bounded  estimation  error.  Specifically,  we  use  a  con¬ 
strained  MLE  when  evaluating  the  MMDL  criterion. 

We  assume  that  the  signal  (uncorrupted  by  noise)  is 
of  bounded  magnitude,  i.e.  that  there  exists  an  a  >  0 
such  that  I  Cf  |<  a  for  all  i.  Then,  it  can  be  shown 
that  since  fn  is  unimodal,  the  constrained  MLE  given 
by 

(  -a  Ci  <  -a 

=  I  Ci  -a<Ci<a  (15) 

1.  a  a  <  Ci 

(where  Ci  is  the  unconstrained  MLE)  maximizes  the 
negentropy  for  fn  subject  to  the  signal  constraint. 

Thus,  we  propose  the  following  scheme,  when  log  K  > 
a?  j2(j'^  and  a  >  (a/2)  -I-  (a^/a)  logiL: 

ro  iflQl^f-b^iogiL 

Ci  =  I  Ci  ^  if  t  +  ^  iog-^  <1  l< 

I  a  sgn(Ci)  if  a  <\Ci\ 

A  numerical  example  appears  in  Figure  3.  In  the 
same  setup  as  the  previous  example,  an  outlier  is  arti¬ 
ficially  induced  at  a  known  location  and  the  absolute 
reconstruction  error  at  that  point  is  plotted  against 
the  outlier  amplitude  for  the  classical,  robust,  and 
bounded-robust  thresholds.  Initially  the  three  errors 
are  nearly  identical.  As  the  amplitude  increases,  the 
classical  threshold  is  crossed  first,  followed  by  the  ro¬ 
bust  and  bounded-robust  thresholds.  Although  the 
robust  error  eventually  catches  up  with  the  classical 
error  and  the  two  continue  to  increase,  the  bounded- 
robust  error  flattens  out.  The  discontinuities  corre¬ 
spond  to  individual  wavelet  coefficients  crossing  the 
thresholds. 


Figure  3:  Absolute  reconstruction  error  vs.  outlier 
amplitude  for  three  thresholds. 


6  Conclusion 

We  proposed  a  Minimax  Description  Length 
(MMDL)  principle  as  the  criterion  of  choice  for 
thresholding  wavelet  coefficients.  We  determined  the 
least  favorable  distribution  in  the  e-contaminated 
normal  family,  which  we  used  to  derive  a  robust 
threshold  that  is  resistant  to  outliers.  We  further 
assumed  that  the  true  signal  has  bounded  ampli¬ 
tude  and  derived  a  thresholding  technique  from  above 
and  below  that  results  in  bounded  estimation  error. 
These  robust  thresholds  yield  denoising  methods  that 
are  less  sensitive  to  heavy-tailed  noise  than  the  tradi¬ 
tional  threshold  based  on  the  assumption  of  normal¬ 
ity. 
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